Cross-Modality Transformer With Modality Mining for Visible-Infrared Person Re-Identification
Abstract
Visible-infrared cross-modality person re-identification is a challenging ReID task, which aims to retrieve and match the same identity's images between the heterogeneous visible and infrared modalities. Thus, the core of this task is to bridge the huge gap between these two modalities. The existing convolutional neural network-based methods mainly face the problem of insufficient perception of modalities' information, and cannot learn good discriminative modality-invariant embeddings for identities, which limits their performance. To solve these problems, we propose a cross-modality transformer-based method (CMTR) for the visible-infrared person re-identification task, which can explicitly mine the information of each modality and generate better discriminative features based on it. Specifically, to capture modalities' characteristics, we design novel modality embeddings, which are fused with token embeddings to encode modalities' information. Furthermore, to enhance the representation of modality embeddings and adjust the matching embeddings' distribution, we propose a modality-aware enhancement loss based on the learned modalities' information, reducing intra-class distance and enlarging inter-class distance. To our knowledge, this is the first work applying a transformer network to the cross-modality re-identification task. We implement extensive experiments on the public SYSU-MM01 and RegDB datasets, and our proposed CMTR model's performance significantly surpasses existing outstanding CNN-based methods.
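The abstract says that modality embeddings are fused with token embeddings but does not say how. The sketch below illustrates one plausible reading, assuming fusion by element-wise addition (much as position embeddings are commonly added in transformers). The names `fuse_modality`, `modality_table`, `VISIBLE`, and `INFRARED` are hypothetical, and the random vectors stand in for parameters that a real model would learn; this is not the authors' implementation.

```python
import random

# Hypothetical modality ids; the paper's actual encoding is not given here.
VISIBLE, INFRARED = 0, 1


def make_embedding(dim, rng):
    """Return a small random vector standing in for a learnable embedding."""
    return [rng.uniform(-0.02, 0.02) for _ in range(dim)]


def fuse_modality(tokens, modality_table, modality_id):
    """Add one modality embedding element-wise to every token embedding.

    Assumption: fusion is plain addition; the abstract only says "fused".
    """
    modality_vec = modality_table[modality_id]
    return [[t + m for t, m in zip(token, modality_vec)] for token in tokens]


rng = random.Random(0)
dim, num_tokens = 8, 4

# One embedding vector per modality (visible, infrared).
modality_table = [make_embedding(dim, rng) for _ in range(2)]

# Stand-in patch/token embeddings for a single image.
tokens = [make_embedding(dim, rng) for _ in range(num_tokens)]

fused_visible = fuse_modality(tokens, modality_table, VISIBLE)
fused_infrared = fuse_modality(tokens, modality_table, INFRARED)
```

Under this assumption, the same token sequence yields different inputs to the transformer depending on its modality, which is one way the network could be made explicitly aware of which modality an image comes from.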
Related resources
Supplementary Material for “RGB-Infrared Cross-Modality Person Re-Identification”
This supplementary material accompanies the paper “RGB-Infrared Cross-Modality Person Re-Identification”. It includes more details of Section 4, as well as extra evaluations of our proposed deep zero-padding method. 1. Details of Counting Domain-Specific Nodes In the third paragraph of Section 4.2 in the main manuscript, we quantify the number of domain-specific nodes in the trained network in ...
Cross Dataset Person Re-identification
Until now, most existing research on person re-identification has aimed at improving the recognition rate in the single-dataset setting. The training data and testing data of these methods are from the same source. Although they have obtained high recognition rates in experiments, they usually perform poorly in practical applications. In this paper, we focus on the cross dataset person re-identification...
Hierarchical Cross Network for Person Re-identification
Person re-identification (person re-ID) aims at matching target person(s) captured from different, non-overlapping camera views. It plays an important role in public safety and has applications in various tasks such as human retrieval, human tracking, and activity analysis. In this paper, we propose a new network architecture called Hierarchical Cross Network (HCN) to perform person re-ID. I...
Discriminating Visible Speech Tokens Using Multi-modality
We present a multimodal interactive data exploration tool that facilitates discrimination between visible speech tokens. The multimodal tool uses visualization and sonification (non-speech sound) of data. Visible speech tokens are a class of multidimensional data that have been used extensively in designing talking heads, which have been used to train deaf individuals by watching speech [1]. V...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2023
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2023.3237155